The Journal of the Acoustical Society of America
● Acoustical Society of America (ASA)
All preprints, ranked by how well they match The Journal of the Acoustical Society of America's content profile, based on 33 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Camperos, M. J. G.; Goncalves, T. C.; Marin, B.; Pavao, R.
Interaural Time Difference (ITD) is the main cue for azimuthal auditory perception in humans. ITDs at each frequency contribute differently to azimuth discrimination, which can be quantified by their azimuthal Fisher Information. Consistently, human ITD discrimination thresholds are predicted by this azimuthal information. However, the prediction is poor for frequencies below 500 Hz. Such poor prediction could be ascribed to quantifying azimuthal information using HRTFs obtained in unnaturalistic anechoic chambers, or to using a direct method that does not incorporate the delay lines proposed by the Jeffress-Colburn model. In the present study, we obtained ITD discrimination thresholds from extensive sampling across frequency and ITD, and applied multiple strategies for quantifying azimuthal information. These strategies employed HRTFs obtained in realistic environments and anechoic chambers, with and without considering delay lines. We found that ITD discrimination thresholds across the complete range of frequencies are better predicted by the azimuthal information conveyed by ITD cues when (1) naturalistic, high-noise HRTFs are used, and (2) ITD delay compensation is not applied. Our results support the view that auditory perception is shaped by natural environments, which include strong reverberation at low frequencies. We also suggest that delay lines are not a crucial feature for determining ITD discrimination thresholds in the human auditory system.
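As a rough illustration of the azimuthal Fisher Information idea invoked above, the sketch below assumes a Woodworth-style spherical-head ITD curve and a fixed Gaussian ITD jitter (both hypothetical stand-ins for measured HRTFs and empirical noise estimates); the information ITD carries about azimuth is then (dITD/dθ)²/σ², and predicted discrimination thresholds scale as 1/√I(θ). This is not the authors' pipeline, only a minimal sketch of the quantity.

```python
# Minimal sketch (not the authors' pipeline): Fisher information that ITD
# carries about azimuth, assuming a Woodworth-style ITD curve and Gaussian
# ITD noise with a fixed standard deviation. Real analyses would use
# measured HRTFs and frequency-dependent noise estimates.
import numpy as np

HEAD_RADIUS = 0.0875       # m, assumed head radius
SPEED_OF_SOUND = 343.0     # m/s
ITD_NOISE_SD = 20e-6       # s, hypothetical ITD jitter

def itd_woodworth(azimuth_rad):
    """Approximate ITD (s) for a spherical head at a given azimuth."""
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (azimuth_rad + np.sin(azimuth_rad))

azimuths = np.linspace(-np.pi / 2, np.pi / 2, 181)
itd = itd_woodworth(azimuths)

# Fisher information about azimuth: (dITD/dtheta)^2 / sigma^2 for Gaussian noise.
d_itd = np.gradient(itd, azimuths)
fisher_info = d_itd**2 / ITD_NOISE_SD**2

# A discrimination-threshold prediction is proportional to 1/sqrt(I(theta)).
predicted_jnd_deg = np.degrees(1.0 / np.sqrt(fisher_info))
print(f"Predicted azimuth JND at 0 deg: {predicted_jnd_deg[90]:.2f} deg")
```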
Andrejkova, G.; Best, V.; Kopco, N.
Psychophysical experiments explored how the repeated presentation of a context, consisting of an adaptor and a target, induces plasticity in the localization of an identical target presented alone on interleaved trials. The plasticity, and its time course, was examined both in a classroom and in an anechoic chamber. Adaptors and targets were 2-ms noise clicks and listeners were tasked with localizing the targets while ignoring the adaptors (when present). The context had either a fixed temporal structure, consisting of a single-click adaptor and a target, or its structure varied from trial to trial, either containing a single-click or an 8-click adaptor. The adaptor was presented either from a frontal or a lateral location, fixed within a run. The presence of context caused responses to the isolated targets to be displaced up to 14° away from the adaptor location. This effect was stronger and slower if the context was variable, growing over the 5-minute duration of the runs. Additionally, the fixed-context buildup had a slower onset in the classroom. Overall, the results illustrate that sound localization is subject to slow adaptive processes that depend on the spatial and temporal structure of the context and on the level of reverberation in the environment.
Bidelman, G.; Eisenhut, Z.; Borowski, L.; Rizzi, R.; Pisoni, D. B.
Purpose: Speech perception requires that listeners classify sensory information into smaller groupings while also coping with noise that often corrupts the speech signal. The strength of categorization and speech-in-noise (SIN) abilities show stark individual differences. Some listeners perceive speech sounds in a gradient fashion, while others categorize in a discrete/binary manner, favoring fine acoustic details vs. a more abstract phonetic code, respectively. Prior work suggests SIN processing is (i) related to more gradient phonetic perception and (ii) varies with musical training. Method: To further probe relations between perceptual gradiency and noise-degraded listening, we measured phoneme categorization, SIN recognition (QuickSIN), and sentence recognition in listeners with varying musical backgrounds. Categorization was measured for vowels and stops using standard labeling tasks. Speech recognition and discrimination were assessed using "elliptical speech" sentences that use featural substitutions which render them meaningless under clean conditions but surprisingly improve their recognition under noise degradation. We hypothesized that listeners who use broader perceptual equivalence classes in hearing elliptical speech would show better SIN perception, indicative of a more gradient listening strategy. Results: Listeners perceived elliptical sentences as sounding different from their intact counterparts in the clear but as the same under noise degradation. This elliptical benefit, however, varied with music background. Nonmusicians showed larger susceptibility to and noise-related benefit of ellipses than musicians, consistent with the notion that they used broader phonetic categories (i.e., more gradient listening). Elliptical speech perception was also associated with QuickSIN performance in both groups but in opposite ways. Conclusions: Use of broader categories was related to better SIN processing in nonmusicians but poorer SIN processing in musicians. Findings suggest listeners can use broader perceptual equivalence classes to deal with degraded listening situations, but this depends critically on their auditory demographics. Nonmusicians might use broader phonetic categories to aid SIN perception, while musicians might use narrower categories or otherwise similar speech contexts.
Ryan-Warden, L.; Ng, E.; Keating, P.
Many listening abilities become more difficult in noisy environments, particularly following hearing loss. Sound localization can be disrupted even if target sounds are clearly audible and distinct from background noise. Since subjects locate sounds by comparing the input to the two ears, sound localization is also considerably impaired by unilateral hearing loss. Currently, however, it is unclear whether the effects of unilateral hearing loss are worsened by background noise. To address this, we measured sound localization abilities in the presence or absence of broadband background noise. Adult human subjects of either sex were tested with normal hearing or with a simulated hearing loss in one ear (earplug). To isolate the role of binaural processing, we tested subjects with narrowband target sounds. Surprisingly, we found that continuous background noise improved narrowband sound localization following simulated unilateral hearing loss. By contrast, we found the opposite effect under normal hearing conditions, with background noise producing illusory shifts in sound localization. Previous attempts to model these shifts are inconsistent with behavioural and neurophysiological data. However, here we found that a simple hemispheric model of sound localization provides an explanation for our results, and provides key hypotheses for future neurophysiological studies. Overall, our results suggest that continuous background noise may be used to improve sound localization under the right circumstances. This has important implications for real-world hearing, both in normal-hearing subjects and the hearing-impaired.

Significance Statement: In noisy environments, many listening abilities become more difficult, even if target sounds are clearly audible. For example, background noise can produce illusory shifts in the perceived direction of target sounds. Because sound localization relies on the two ears working together, it is also distorted by a hearing loss in one ear. We might therefore expect background noise to worsen the effects of unilateral hearing loss. Surprisingly, we found the opposite, with background noise improving sound localization when we simulated a hearing loss in one ear. A simple hemispheric model of sound localization also helped explain the negative effects of background noise under normal hearing conditions. Overall, our results highlight the potential for using background noise to improve sound localization.
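The "simple hemispheric model" mentioned above is not spelled out in the abstract; the sketch below is only a generic two-channel opponent readout, with hypothetical tuning and gain parameters, to illustrate how attenuating one side (a crude stand-in for an earplug) biases the decoded azimuth in such a code.

```python
# Minimal sketch (assumptions, not the authors' model): a two-channel
# "hemispheric" code in which each channel is broadly tuned to one side and
# azimuth is read out from the normalized difference of channel activities.
# A left earplug is crudely simulated as a gain reduction on the left channel.
import numpy as np

def channel_activity(azimuth_deg, preferred_side, sigma_deg=60.0):
    """Broad sigmoidal tuning favouring one hemifield (hypothetical shape)."""
    return 1.0 / (1.0 + np.exp(-preferred_side * azimuth_deg / sigma_deg))

def decode_azimuth(azimuth_deg, left_gain=1.0, right_gain=1.0, k=90.0):
    """Decode azimuth from the difference of left- and right-preferring channels."""
    right = right_gain * channel_activity(azimuth_deg, +1)
    left = left_gain * channel_activity(azimuth_deg, -1)
    return k * (right - left) / (right + left)

for true_az in (-60, -20, 0, 20, 60):
    normal = decode_azimuth(true_az)
    plugged = decode_azimuth(true_az, left_gain=0.5)  # simulated left earplug
    print(f"true {true_az:+4d} deg -> normal {normal:+6.1f}, plugged {plugged:+6.1f}")
```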
Fabio, C.; Kayser, C.
Numerous studies advocate for a rhythmic mode of perception. However, the evidence in the context of auditory perception remains inconsistent. We propose that the divergent conclusions drawn from previous work stem from conceptual and methodological issues. These include ambiguous assumptions regarding the origin of perceptual rhythmicity, variations in listening tasks and attentional demands, differing analytical approaches, and the reliance on fixed participant samples for statistical testing. To systematically address these points, we conducted a series of experiments in which human participants performed auditory tasks involving monaural target sounds presented against binaural white noise backgrounds, while also recording eye movements. These experiments varied in whether stimuli were presented randomly or required motor initialization by the participants, the necessity of memory across trials and the manipulation of attentional demands across modalities. Our findings challenge the notion of universal rhythmicity in hearing, but support the existence of paradigm- and ear-specific fluctuations in perceptual sensitivity and response bias that emerge at multiple frequencies. Notably, the rhythmicity for sounds in the left and right ears appears to be largely independent among participants, and the strength of rhythmicity in behavioural data is possibly linked to oculomotor activity and attentional requirements of the task. Overall, these results may help to resolve conflicting conclusions drawn in previous work and provide specific avenues for further studies into the rhythmicity of auditory perception.
Alampounti, L. C.; Rosen, S.; Cooper, H.; Bizley, J. K.
Investigations of the role of audiovisual integration in speech-in-noise perception have largely focused on the benefits provided by lipreading cues. Nonetheless, audiovisual temporal coherence can offer a complementary advantage in auditory selective attention tasks. We developed an audiovisual speech-in-noise test to assess the benefit of visually conveyed phonetic information and visual contributions to auditory streaming. The test was a video version of the Children's Coordinate Response Measure with a noun as the second keyword (vCCRMn). The vCCRMn allowed us to measure speech reception thresholds in the presence of two competing talkers under three visual conditions: a full naturalistic video (AV), a video which was interrupted during the target word presentation (Inter), thus providing no lipreading cues, and a static image of a talker with audio only (A). In each case, the video/image could display either the target talker or one of the two competing maskers. We assessed speech reception thresholds in each visual condition in 37 young (≤35 years old) normal-hearing participants. Lipreading ability was independently assessed with the Test of Adult Speechreading (TAS). Results showed that both target-coherent AV and Inter visual conditions offer participants a listening benefit over the static-image audio-only condition, with the full AV target-coherent condition providing the most benefit. Lipreading ability correlated with the audiovisual benefit shown in the full AV target-coherent condition, but not with the benefit in the Inter target-coherent condition. Together, our results are consistent with visual information providing independent benefits to listening, through lipreading and enhanced auditory streaming.
Pinto, D.; Agmon, G.; Zion Golumbic, E.
Processing speech in multi-speaker environments poses substantial challenges to the human perceptual and attention system. Moreover, different contexts may require employing different listening strategies. For instance, in some cases individuals pay attention Selectively to one speaker and attempt to ignore all other task-irrelevant sounds, whereas other contexts may require listeners to Distribute their attention among several speakers. Spatial and spectral acoustic cues both play an important role in assisting listeners to segregate concurrent speakers. However, how these cues interact with varying demands for allocating top-down attention is less clear. In the current study, we test and compare how spatial cues are utilized to benefit performance on these different types of attentional tasks. To this end, participants listened to a concoction of two or four speakers, presented either as emanating from different locations in space or with no spatial separation. In separate trials, participants were required to employ different listening strategies, and to detect a target word spoken either by one pre-defined speaker (Selective Attention) or by any of the speakers (Distributed Attention). Results indicate that the presence of spatial cues improved performance, particularly in the two-speaker condition, which is in line with the important role of spatial cues in stream segregation. However, spatial cues provided similar benefits to performance under Selective and Distributed attention. This pattern suggests that despite the advantage of spatial cues for stream segregation, they were nonetheless insufficient for directing a more focused attentional spotlight towards the location of a designated speaker in the Selective attention condition.
Poole, K. C.; With, S.; Martin, V.; Chait, M.; Picinali, L.; Shiell, M. M.
Everyday listening relies on the auditory system's ability to automatically monitor the background soundscape and detect new or changing sources. Although change detection is a fundamental aspect of situational awareness, little is known about how hearing impairment affects this ability. This study examined how sensorineural hearing loss influences spatial auditory change detection. Older hearing-impaired listeners (N = 30) completed a spatial change detection task requiring them to identify the appearance of a new sound source within a complex spatialised acoustic scene. Hearing loss was characterised by three factors that were measured with standard clinical tests: audiometric hearing thresholds, sensitivity to small level changes, and sensitivity to spectrotemporal modulation. Simple and mixed-effects linear models were used to test how these factors predicted reaction time, hit rate, and false alarm rate. Listeners with poorer spectrotemporal sensitivity, higher audiometric hearing thresholds, and older age showed slower and less accurate detection, whereas sensitivity to small changes in level did not predict outcomes. Detection also varied with spatial location: sources appearing from behind were detected more slowly and less accurately than those from the front or sides. Numerical analysis using head-related transfer functions confirmed that these rear-field effects were unlikely to be explained by overall or frequency-specific acoustic level differences. These findings reveal that hearing loss, age, and spatial factors jointly shape listeners' ability to monitor dynamic auditory scenes. Additionally, testing spectrotemporal sensitivity offers a promising clinical measure of non-speech auditory processing with relevance for hearing-aid fitting and situational awareness.
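As a rough illustration of the mixed-effects analysis described above, the sketch below fits a linear mixed model with a random intercept per listener; the variable names (stm_sensitivity, pta_threshold, etc.) and the synthetic data frame are hypothetical placeholders, not the authors' data or exact model.

```python
# Minimal sketch (hypothetical variable names, not the authors' exact model):
# a mixed-effects model predicting reaction time from spectrotemporal
# sensitivity, audiometric threshold and age, with a random intercept per
# listener. The data frame below is a synthetic stand-in for the real data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n_listeners, n_trials = 30, 40
listener = np.repeat(np.arange(n_listeners), n_trials)
stm = np.repeat(rng.normal(0, 1, n_listeners), n_trials)      # spectrotemporal sensitivity
pta = np.repeat(rng.normal(40, 10, n_listeners), n_trials)    # audiometric threshold (dB HL)
age = np.repeat(rng.uniform(55, 80, n_listeners), n_trials)
rt = (1.0 - 0.1 * stm + 0.01 * pta + 0.005 * age
      + np.repeat(rng.normal(0, 0.1, n_listeners), n_trials)  # listener-specific offset
      + rng.normal(0, 0.2, len(listener)))                    # trial-by-trial noise
df = pd.DataFrame(dict(listener=listener, stm_sensitivity=stm,
                       pta_threshold=pta, age=age, reaction_time=rt))

model = smf.mixedlm("reaction_time ~ stm_sensitivity + pta_threshold + age",
                    data=df, groups=df["listener"])
print(model.fit().summary())
```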
van Bentum, G. C.; Van Opstal, J.; van Wanrooij, M. M.
Sound localization and identification are challenging in acoustically rich environments. The relation between these two processes is still poorly understood. As natural sound sources rarely occur exactly simultaneously, we wondered whether the auditory system could identify ("what") and localize ("where") two spatially separated sounds with synchronous onsets. While listeners typically report hearing a single source at an average location, one study found that both sounds may be accurately localized if listeners are explicitly told that two sources exist. We here tested whether simultaneous source identification (one vs. two) and localization is possible, by letting listeners choose to make either one or two head-orienting saccades to the perceived location(s). Results show that listeners could identify two sounds only when presented on different sides of the head, and that identification accuracy increased with their spatial separation. Notably, listeners were unable to accurately localize either sound, irrespective of whether one or two sounds were identified. Instead, the first (or only) response always landed near the average location, while second responses were unrelated to the targets. We conclude that localization of synchronous sounds in the absence of prior information is impossible. We discuss that the putative cortical "what" pathway may not transmit relevant information to the "where" pathway. We examine how a broadband interaural correlation cue could help to correctly identify the presence of two sounds without being able to localize them. We propose that the persistent averaging behavior reveals that the "where" system intrinsically assumes that synchronous sounds originate from a single source.

Significance Statement: It is poorly understood whether identification ("what") of sounds and their localization ("where") are inter-related or independent neural processes. We measured sound-localization responses towards synchronous sounds to examine potential coupling of these processes. We varied the spatial configurations of two sounds and found that although identification improved considerably with larger spatial separation, their localization was unaffected: responses were always directed towards the average location. This shows an absence of mutual coupling of information between the "what" and "where" streams in the auditory system. We also show how broadband interaural correlation could explain the improved identification results without affecting localization performance, and explain how the persistent spatial averaging could be understood from strong internal priors regarding sound synchronicity.
Shan, T.; Cappelloni, M. S.; Maddox, R. K.
Music and speech are two sounds that are unique to human beings and encountered in daily life. Both are transformed by the auditory pathway from an initial acoustical encoding to higher-level cognition. Most studies of speech and music processing are focused on the cortex, and the subcortical response to natural, polyphonic music is essentially unstudied. This study aimed to compare the subcortical encoding of music and speech using the auditory brainstem response (ABR). While several methods have recently been developed to derive the ABR to continuous speech, they are either not applicable to music or give poor results. In this study, we explored deriving the ABR through deconvolution using three regressors: 1) the half-wave rectified stimulus waveform, 2) the modeled inner hair cell potential, and 3) the auditory nerve model firing rate (ANM), where the latter two were generated from a computational auditory periphery model. We found that the ANM regressor yields robust and interpretable ABR waveforms to diverse genres of music and multiple types of speech. We then used the ANM-derived ABRs to compare the subcortical responses to music and speech and found that they are highly similar in morphology. We further investigated cortical responses using the same deconvolution method and found the responses there were also quite similar, which was unexpected based on previous studies. We conclude that when using our proposed deconvolution regressor, which accounts for acoustical differences and nonlinear effects on peripheral encoding, the derived brainstem and cortical responses to music and speech are highly correlated.
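The deconvolution step described above can be sketched as a regularized frequency-domain division of the EEG spectrum by the regressor spectrum. The sketch below is only an illustrative implementation under simplifying assumptions (single channel, precomputed regressor, synthetic stand-in data), not the authors' code, and the ANM regressor itself is not reproduced.

```python
# Minimal sketch (assumptions, not the authors' exact pipeline): derive a
# response waveform by regularized frequency-domain deconvolution of the EEG
# with a stimulus regressor (e.g., an auditory-nerve-model firing rate).
# Both signals are assumed to be aligned and at the same sampling rate.
import numpy as np

def derive_response(eeg, regressor, fs, lag_ms=(-10, 30), reg=1e-2):
    """Deconvolve eeg by regressor; return lags (ms) and response kernel."""
    n = len(eeg)
    E = np.fft.rfft(eeg, n)
    R = np.fft.rfft(regressor, n)
    # Wiener-style regularization to avoid dividing by near-zero spectra.
    kernel = np.fft.irfft(E * np.conj(R) / (np.abs(R) ** 2 + reg * np.mean(np.abs(R) ** 2)), n)
    lags = np.arange(n) / fs * 1000.0
    lags[lags > lags[-1] / 2] -= lags[-1] + 1000.0 / fs   # wrap the second half to negative lags
    order = np.argsort(lags)
    keep = (lags[order] >= lag_ms[0]) & (lags[order] <= lag_ms[1])
    return lags[order][keep], kernel[order][keep]

# Usage with synthetic placeholders standing in for real recordings:
fs = 10000
rng = np.random.default_rng(0)
regressor = rng.poisson(2.0, fs * 60).astype(float)       # stand-in for an ANM firing rate
true_kernel = np.exp(-np.arange(0, 0.01, 1 / fs) / 0.002) # stand-in response kernel
eeg = np.convolve(regressor, true_kernel, mode="full")[: len(regressor)]
eeg += rng.normal(0, 1.0, len(eeg))
lags, abr = derive_response(eeg, regressor, fs)
```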
Jüchter, C.; Beutelmann, R.; Klump, G. M.
Exposure to loud sounds can lead to hearing impairments. Speech comprehension, especially in the presence of background sounds, allegedly declines as a consequence of noise-induced hearing loss. However, the connection between noise overexposure and deteriorated speech-in-noise perception is not clear yet, and potential underlying mechanisms are still under debate. This study investigates speech-in-noise discrimination in young-adult Mongolian gerbils before and after an acoustic trauma to reveal possible noise-induced changes in the perception of speech sounds and to examine the commonly suggested link between noise exposure and speech-in-noise perception difficulties. Nine young-adult gerbils were trained to discriminate a deviant consonant-vowel-consonant combination (CVC) or vowel-consonant-vowel combination (VCV) in a sequence of CVC or VCV standards, respectively. The logatomes were spoken by different speakers and masked by a steady-state speech-shaped noise. After the gerbils completed the baseline behavioral experiments, they underwent an acoustic trauma, and data were collected a second time in the behavioral experiments. Applying multidimensional scaling, response latencies were used to generate perceptual maps reflecting the gerbils' internal representations of the sounds pre- and post-trauma. To evaluate how the discrimination of vowels and consonants was altered after the acoustic trauma, changes in response latencies between phoneme pairs were investigated with regard to their articulatory features. Auditory brainstem responses were measured to assess peripheral auditory function. We found that the perceptual maps of vowels and consonants were very similar before and after noise exposure. Interestingly, the gerbils' overall vowel discrimination ability was improved after the acoustic trauma, even though the gerbils suffered from noise-induced hearing loss. In contrast to the improvements in vowel discrimination, there were only minor changes in the gerbils' ability to discriminate consonants. Moreover, the noise exposure showed a differential influence on the response latencies for vowel and consonant discriminations depending on the articulatory features of the specific phonemes.
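A minimal sketch of the multidimensional-scaling step described above: pairwise response latencies are converted to dissimilarities (faster deviant detection treated as larger perceptual distance) and embedded in two dimensions. The phoneme set, latency values, and the latency-to-distance mapping below are hypothetical placeholders, not the study's data.

```python
# Minimal sketch (not the authors' pipeline): build a perceptual map from
# pairwise response latencies with multidimensional scaling. Shorter latency
# to a deviant is treated as a larger perceptual distance from the standard.
import numpy as np
from sklearn.manifold import MDS

phonemes = ["a", "e", "i", "o", "u"]               # hypothetical vowel set
rng = np.random.default_rng(1)
latency_ms = rng.uniform(150, 400, (5, 5))         # stand-in latency matrix
latency_ms = (latency_ms + latency_ms.T) / 2       # symmetrize
np.fill_diagonal(latency_ms, 400)                  # same-sound pairs: slowest responses

# Convert latencies to dissimilarities: faster detection -> more dissimilar.
dissimilarity = latency_ms.max() - latency_ms
np.fill_diagonal(dissimilarity, 0.0)

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
perceptual_map = mds.fit_transform(dissimilarity)
for p, (x, y) in zip(phonemes, perceptual_map):
    print(f"{p}: ({x:+.2f}, {y:+.2f})")
```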
Agarwalla, S.; Farhadi, A.; Carney, L. H.
The role of medial olivocochlear (MOC) efferent gain control in auditory enhancement (AE) was investigated using a subcortical auditory model. AE refers to the influence of a precursor on detectability of targets. The absence (or presence) of a precursor component at the target frequency enhances (or suppresses) detection under simultaneous masking conditions. Furthermore, the enhanced target under simultaneous masking acts as a stronger forward masker for a delayed probe tone, known as AE under forward masking. Psychoacoustic studies of AE report findings that challenge conventional expectations, and the underlying mechanisms remain unclear. For instance, listeners with hearing impairment have AE under simultaneous masking but not forward masking (Kreft et al., 2018; Kreft and Oxenham, 2019), whereas listeners with normal hearing have level-dependent AE under forward masking (Kreft and Oxenham, 2019). Our model with MOC efferent gain control successfully replicated these findings. In contrast, a model without efferent gain control failed to capture these effects, supporting the hypothesis that MOC-mediated cochlear gain modulation may play a role in AE and its alteration by hearing loss.
Sergeeva, A.; Kidmose, P.
Auditory masking is important in the characterization of human hearing and hearing impairment. Traditionally, masking is assessed through behavioral methods, which require active participant engagement. This study investigates the potential of using the Auditory Steady-State Response (ASSR) to assess auditory masking, enabling masking assessment without requiring active participation. ASSRs were measured in response to a 40-Hz amplitude-modulated probe signal with and without the presence of a masker. The probe signals were 1/3-octave bandwidth Gaussian noise centered at 891 and 1414 Hz (center frequency, CF) and presented at 10, 20, 30, and 40 dB above individual behavioral masking thresholds (MT). The masker was lowpass Gaussian noise (cut-off 707 Hz) presented at 65 and 85 dB SPL (masker level, ML). The ASSR amplitude increased with presentation level (PL) and decreased in the presence of a masker, confirming a masking effect on the ASSR. At 65 dB ML, ASSRs did not differ between center frequencies when probe signals were presented relative to MT, suggesting a simple relationship between MT and ASSR. At 85 dB ML, an effect of CF was observed, suggesting that the relationship between MT and ASSR is more complex than initially anticipated and involves all of the experimental parameters (CF, PL, and ML).
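A minimal sketch of how a 40-Hz ASSR amplitude might be extracted: average the EEG epochs and read off the single-sided FFT magnitude at the modulation frequency. The sampling rate, epoch count, and synthetic signal below are assumptions for illustration, not the recording parameters of the study.

```python
# Minimal sketch (assumed parameters, not the authors' analysis): estimate
# the 40-Hz ASSR amplitude as the FFT magnitude at the modulation frequency
# of the epoch-averaged EEG, e.g. for probe-alone vs probe-plus-masker runs.
import numpy as np

def assr_amplitude(epochs, fs, mod_freq=40.0):
    """epochs: (n_epochs, n_samples) EEG array; returns amplitude at mod_freq."""
    mean_epoch = epochs.mean(axis=0)                 # averaging suppresses non-phase-locked noise
    spectrum = np.fft.rfft(mean_epoch) / len(mean_epoch)
    freqs = np.fft.rfftfreq(len(mean_epoch), 1.0 / fs)
    bin_idx = np.argmin(np.abs(freqs - mod_freq))
    return 2.0 * np.abs(spectrum[bin_idx])           # single-sided amplitude

# Usage with synthetic epochs standing in for recordings:
fs, n_epochs, dur = 1000, 200, 1.0
t = np.arange(int(fs * dur)) / fs
rng = np.random.default_rng(2)
epochs = 0.2 * np.sin(2 * np.pi * 40.0 * t) + rng.normal(0, 1.0, (n_epochs, len(t)))
print(f"ASSR amplitude: {assr_amplitude(epochs, fs):.3f} (true value 0.2)")
```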
McLachlan, G. A.; Majdak, P.; Reijniers, J.; Mihocic, M.; Peremans, H.
Self-motion is an essential but often overlooked component of sound localisation. While the directional information of a source is implicitly contained in head-centred acoustic cues, that acoustic input needs to be continuously combined with sensorimotor information about the head orientation in order to decode these cues to a world-centred frame of reference. On top of that, the use of head movement significantly reduces ambiguities in the directional information provided by the incoming sound. In this work, we evaluate a Bayesian model that predicts dynamic sound localisation, by comparing its predictions to human performance measured in a behavioural sound-localisation experiment. Model parameters were set a priori, based on results from various psychoacoustic and sensorimotor studies, i.e., without any post-hoc parameter fitting to behavioural results. In a spatial analysis, we evaluated the model's capability to predict spatial localisation responses. Further, we investigated specific effects of the stimulus duration, the spatial prior and the sizes of various model uncertainties on the predictions. The spatial analysis revealed general agreement between the predictions and the actual behaviour. The altering of the model uncertainties and stimulus duration revealed a number of interesting effects, providing new insights on modelling the human integration of acoustic and sensorimotor information in a localisation task.

Author summary: In everyday life, sound localisation requires both interaural and monaural acoustic information. In addition to this, sensorimotor information about the position of the head is required to create a stable and accurate representation of our acoustic environment. Bayesian inference is an effective mathematical framework to model how humans combine information from different sources and form beliefs about the world. Here, we compare the predictions from a Bayesian model for dynamic sound localisation with data from a localisation experiment. We show that we can derive the model parameter values from previous psychoacoustic and sensorimotor experiments and that the model, without any post-hoc fitting, can predict general dynamic localisation performance. Finally, the discrepancies between the modelled data and behavioural data are analysed by testing the effects of adjusting the model parameters.
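As a toy illustration of the Bayesian combination at the heart of such models, the sketch below multiplies a Gaussian acoustic likelihood over azimuth with a frontal spatial prior and reads out the MAP estimate. The likelihood and prior widths are hypothetical, and the dynamic, head-movement-dependent part of the published model is omitted entirely.

```python
# Minimal sketch (not the published model): Bayesian combination of an
# acoustic likelihood over azimuth with a spatial prior, as a static
# single-cue illustration of the inference described above.
import numpy as np

azimuths = np.linspace(-90, 90, 361)            # candidate world-centred azimuths (deg)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

def posterior_azimuth(observed_az, sensory_sigma=10.0, prior_sigma=30.0):
    """Posterior over azimuth given a noisy acoustic observation and a central prior."""
    likelihood = gaussian(azimuths, observed_az, sensory_sigma)
    prior = gaussian(azimuths, 0.0, prior_sigma)   # prior favouring frontal locations
    post = likelihood * prior
    return post / post.sum()

post = posterior_azimuth(observed_az=45.0)
print(f"MAP estimate: {azimuths[np.argmax(post)]:.1f} deg "
      f"(pulled towards the prior from the observed 45.0 deg)")
```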
Kim, H.; Ratkute, V.; Epp, B.
Auditory stream segregation can be facilitated when the maskers share coherent amplitude modulations or when spatial cues can be exploited. The effectiveness of each cue can be quantified as a decrease in masked thresholds, termed comodulation masking release (CMR) and binaural masking level difference (BMLD), respectively. Prolonged exposure to the masker can influence subsequent target segregation. However, the collective impact of preceding noise on target segregation in the presence of comodulation and interaural phase difference (IPD) cues is unclear. Stimuli were designed to induce noise streams by altering the duration and temporal coherence of the envelope of the preceding masker. The effect on subsequent target detection, with CMR and with BMLD induced by an IPD of the target tone, was investigated. The results indicate that the effect of the preceding stream formation on CMR operates on different time scales, extending beyond 200 ms, depending on the spectrotemporal characteristics of the maskers. However, the effect on IPD-induced BMLD was not significant across time. Under the simplifying assumption that peripheral processing operates on shorter time scales than cortical processing, the results of the present study may provide insights into auditory signal processing in the presence of beneficial cues for target stream segregation.
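For concreteness, CMR and BMLD are simply differences between masked thresholds measured in reference and test conditions, expressed in dB. The threshold values in the sketch below are hypothetical and serve only to show the arithmetic.

```python
# Minimal sketch (hypothetical threshold values): masking release measures
# are differences between masked thresholds in reference and test conditions.
reference_threshold_db = 65.0     # uncorrelated masker bands, no target IPD
comodulated_threshold_db = 57.0   # masker bands share coherent amplitude modulation
ipd_threshold_db = 55.0           # comodulated masker, target additionally carries an IPD

cmr_db = reference_threshold_db - comodulated_threshold_db   # comodulation masking release
bmld_db = comodulated_threshold_db - ipd_threshold_db        # further release from the target IPD
print(f"CMR = {cmr_db:.1f} dB, IPD-induced BMLD = {bmld_db:.1f} dB")
```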
Williams, J. D.
A representative audio file for the Blue Jay (Cyanocitta cristata), thought to be most similar to suspected kent sounds of the Ivory-billed Woodpecker (Campephilus principalis), was examined at the spectrogram level. A total of 136 other Blue Jay files were examined and compared to kents (n > 200) from six Ivory-billed Woodpecker expeditions. At this level of detail, differences are seen such that these two species cannot be mistaken for each other.
Wetekam, J.; Hechavarria, J. C.; Lopez-Jury, L.; Gonzalez-Palomares, E.; Koessl, M.
Deviance detection describes an increase of neural response strength caused by a stimulus with a low probability of occurrence. This ubiquitous phenomenon has been reported in multiple species and at stages ranging from subthalamic areas to auditory cortex. While cortical deviance detection has been well characterised by a range of studies covering neural activity at the population level (mismatch negativity, MMN) as well as at the cellular level (stimulus-specific adaptation, SSA), subcortical deviance detection has been studied mainly at the cellular level in the form of SSA. Here, we aim to bridge this gap by using noninvasively recorded auditory brainstem responses (ABRs) to investigate deviance detection at the population level in the lower stations of the auditory system of a hearing specialist: the bat Carollia perspicillata. Our present approach uses behaviourally relevant vocalisation stimuli that are closer to the animals' natural soundscape than the artificial stimuli used in previous studies that focussed on subcortical areas. We show that deviance detection in ABRs is significantly stronger for echolocation pulses than for social communication calls or artificial sounds, indicating that subthalamic deviance detection depends on the behavioural meaning of a stimulus. Additionally, complex physical sound features like frequency and amplitude modulation affected the strength of deviance detection in the ABR. In summary, our results suggest that, at the population level, the bat brain can detect different types of deviants already in the brainstem. This shows that subthalamic brain structures exhibit more advanced forms of deviance detection than previously known.
McHaney, J. R.; Hancock, K. E.; Polley, D. B.; Parthasarathy, A.
Optimal speech perception in noise requires successful separation of the target speech stream from multiple competing background speech streams. The ability to segregate these competing speech streams depends on the fidelity of bottom-up neural representations of sensory information in the auditory system and top-down influences of effortful listening. Here, we use objective neurophysiological measures of bottom-up temporal processing using envelope-following responses (EFRs) to amplitude-modulated tones and investigate their interactions with pupil-indexed listening effort, as it relates to performance on the Quick Speech-in-Noise (QuickSIN) test in young adult listeners with clinically normal hearing thresholds. We developed an approach using ear-canal electrodes and adjusting electrode montages for modulation rate ranges, which extended the range of reliable EFR measurements as high as 1024 Hz. Pupillary responses revealed changes in listening effort at the two most difficult signal-to-noise ratios (SNRs), but behavioral deficits at the hardest SNR only. Neither pupil-indexed listening effort nor the slope of the EFR decay function was independently related to QuickSIN performance. However, a linear model using the combination of EFRs and pupil metrics significantly explained variance in QuickSIN performance. These results suggest a synergistic interaction between bottom-up sensory coding and top-down measures of listening effort as it relates to speech perception in noise. These findings can inform the development of next-generation tests for hearing deficits in listeners with normal hearing thresholds that incorporate a multi-dimensional approach to understanding speech intelligibility deficits.
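A minimal sketch of the kind of combined model described above: QuickSIN scores regressed jointly on an EFR decay slope and a pupil-based effort metric. The variable names and synthetic data are hypothetical; only the modelling idea (joint prediction from bottom-up and top-down measures) follows the abstract.

```python
# Minimal sketch (hypothetical variable names, not the authors' exact model):
# a linear model in which EFR amplitude-decay slope and a pupil-based effort
# metric jointly predict QuickSIN scores.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 30                                              # listeners (synthetic stand-in data)
efr_slope = rng.normal(-0.5, 0.1, n)                # EFR amplitude decay across modulation rate
pupil_effort = rng.normal(0.3, 0.05, n)             # pupil-indexed listening effort
quicksin = 2.0 + 4.0 * efr_slope - 3.0 * pupil_effort + rng.normal(0, 0.3, n)

X = sm.add_constant(np.column_stack([efr_slope, pupil_effort]))
fit = sm.OLS(quicksin, X).fit()
print(fit.summary())
```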
Losorelli, S.; Kaneshiro, B.; Musacchia, G. A.; Blevins, N. H.; Fitzgerald, M. B.
The ability to differentiate complex sounds is essential for communication. Here, we propose using a machine-learning approach, called classification, to objectively evaluate auditory perception. In this study, we recorded frequency-following responses (FFRs) from 13 normal-hearing adult participants to six short music and speech stimuli sharing similar fundamental frequencies but varying in overall spectral and temporal characteristics. Each participant completed a perceptual identification test using the same stimuli. We used linear discriminant analysis to classify FFRs. Results showed statistically significant FFR classification accuracies using both the full response epoch in the time domain (72.3% accuracy, p < 0.001) and real and imaginary Fourier coefficients up to 1 kHz (74.6%, p < 0.001). We classified decomposed versions of the responses in order to examine which response features contributed to successful decoding. Classifier accuracies using Fourier magnitude and phase alone in the same frequency range were lower but still significant (58.2% and 41.3% respectively, p < 0.001). Classification of overlapping 20-msec subsets of the FFR in the time domain similarly produced reduced but significant accuracies (42.3%-62.8%, p < 0.001). Participants' mean perceptual responses were most accurate (90.6%, p < 0.001). Confusion matrices from FFR classifications and perceptual responses were converted to distance matrices and visualized as dendrograms. FFR classifications and perceptual responses demonstrate similar patterns of confusion across the stimuli. Our results demonstrate that classification can differentiate auditory stimuli from FFRs with high accuracy. Moreover, the reduced accuracies obtained when the FFR is decomposed in the time and frequency domains suggest that different response features contribute complementary information, similar to how the human auditory system is thought to rely on both timing and frequency information to accurately process sound. Taken together, these results suggest that FFR classification is a promising approach for objective assessment of auditory perception.
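A minimal sketch of the classification approach described above: linear discriminant analysis on real and imaginary Fourier coefficients up to 1 kHz, evaluated with cross-validation. The data below are synthetic stand-ins, and the cross-validation scheme is an assumption rather than the authors' exact procedure.

```python
# Minimal sketch (synthetic stand-in data, not the recorded FFRs): classify
# single-trial responses with linear discriminant analysis, using real and
# imaginary Fourier coefficients below 1 kHz as features.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score, StratifiedKFold

fs, n_trials, n_samples, n_classes = 4000, 120, 600, 6
rng = np.random.default_rng(4)
labels = np.repeat(np.arange(n_classes), n_trials // n_classes)
# Synthetic "FFRs": each class gets a slightly different fundamental frequency plus noise.
t = np.arange(n_samples) / fs
ffr = np.array([np.sin(2 * np.pi * (100 + 10 * lab) * t) for lab in labels])
ffr += rng.normal(0, 2.0, ffr.shape)

# Features: real and imaginary Fourier coefficients up to 1 kHz.
spectra = np.fft.rfft(ffr, axis=1)
freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)
keep = freqs <= 1000.0
features = np.hstack([spectra[:, keep].real, spectra[:, keep].imag])

scores = cross_val_score(LinearDiscriminantAnalysis(), features, labels,
                         cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print(f"Mean classification accuracy: {scores.mean():.1%}")
```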
Begus, G.; Holt, M.; Wright, B.; Gruber, D. F.
The vocal communication system of orcas (Orcinus orca) has so far been analyzed primarily in terms of fundamental frequency (F0) modulations, i.e., the vibration frequency of their phonic lips. The calls have been divided into clicks, pulsed calls, whistles, and types thereof. By analyzing 61 hours of on-orca acoustic recordings and controlling for the effect of high-frequency components (HFC) and F0, we report structured formant patterns in orca vocalizations, including diphthongal trajectories. Broadband spectrogram analysis reveals previously unreported formant patterns that appear independent of F0 and HFC and are hypothesized to result from air sac resonances. This study builds on the recent report of formant structure in vowel- and diphthong-like calls in another cetacean, the sperm whale (Physeter macrocephalus). Using linguistic techniques, we further demonstrate that some calls are reminiscent of human consonant-vowel sequences, featuring bursts or abrupt decreases in amplitude. We also show that individual sparsely distributed clicks gradually transition into high-frequency tonal calls, which aligns with analysis of sperm whale codas as vocalic pulses. The paper makes methodological contributions to cetacean communication research by analyzing orca vocalizations with both narrowband and broadband spectrograms. The reported patterns are hypothesized to be actively controlled by the whales and may carry communicative information. The spectral patterns shown in this study provide an added dimension to the orca communication system that merits further analysis, and demonstrate convergent evolution of similar phonological features in cetaceans (orca and sperm whale) and human communication systems.
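The narrowband/broadband distinction mentioned above comes down to spectrogram window length: long windows resolve the F0 harmonics, while short windows smear harmonics and expose formant-like spectral envelopes. The sketch below shows both settings on a synthetic harmonic complex; the signal and parameters are illustrative assumptions, not the study's recordings or analysis settings.

```python
# Minimal sketch (assumed parameters): narrowband vs broadband spectrograms
# of the same signal. A long window gives fine frequency resolution (harmonics);
# a short window gives fine time resolution (formant-like envelopes).
import numpy as np
from scipy.signal import spectrogram

fs = 48000
t = np.arange(0, 1.0, 1 / fs)
# Stand-in signal: a 400-Hz harmonic complex loosely mimicking a pulsed call.
call = np.sum([np.sin(2 * np.pi * 400 * k * t) / k for k in range(1, 20)], axis=0)

# Narrowband: long window (good frequency resolution, tracks F0 and harmonics).
f_nb, t_nb, s_nb = spectrogram(call, fs, nperseg=4096, noverlap=3072)
# Broadband: short window (good time resolution, reveals formant-like bands).
f_bb, t_bb, s_bb = spectrogram(call, fs, nperseg=256, noverlap=192)

print(f"Narrowband frequency resolution: {f_nb[1]:.1f} Hz; broadband: {f_bb[1]:.1f} Hz")
```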